feat(docker): optimize concurrency performance and memory management by mzyfree · Pull Request #1689 · unclecode/crawl4ai

mzyfree · 2026-01-04T01:29:17Z

Summary

This PR introduces a comprehensive optimization suite for crawl4ai in high-concurrency Docker environments. It focuses on improving QPS (Queries Per Second) and ensuring long-term memory stability by re-engineering the browser pooling mechanism and introducing optional resource filtering.

Key Design Principle: All new features are opt-in. By default, the system behaves exactly as before, ensuring zero impact on existing community users.

Core Enhancements:

Tiered Browser Pooling: Replaced the basic pool with a tiered system (Hot, Cold, Retired) to better manage instance lifecycles.
Reference Counting: Implemented active_requests tracking to prevent browsers from being closed while still processing requests, fixing common "Target closed" errors under load.
Aggressive Memory Management: Added a browser retirement mechanism that recycles instances after a certain number of uses or when system memory pressure is high.
Resource Optimization: Added optional blocking for CSS and ad-related network requests, significantly reducing the memory footprint per instance.
Observability: Added optional pool audit logs for real-time monitoring of browser health and usage.

New Configuration Options

These new features can be enabled via BrowserConfig or Environment Variables:

Engine Layer (`BrowserConfig`)

avoid_ads (bool, default: False): Enable intercepting and blocking ad/tracker network requests.
avoid_css (bool, default: False): Enable blocking CSS resource loading to save CPU/Memory.

Docker Layer (Environment Variables)

CRAWL4AI_BROWSER_RETIREMENT_ENABLED (default: false): Enable the usage/memory-based retirement mechanism.
CRAWL4AI_PERMANENT_BROWSER_DISABLED (default: false): If true, disables the always-on permanent browser instance.
CRAWL4AI_POOL_AUDIT_ENABLED (default: false): Enable detailed pool status logging every 5 minutes.
CRAWL4AI_BROWSER_MAX_USAGE (default: 100): Max requests per instance before retirement.
CRAWL4AI_MEMORY_RETIRE_THRESHOLD (default: 75): System memory % to trigger aggressive retirement.

List of files changed and why

crawl4ai/async_configs.py: Added new parameters to BrowserConfig.
crawl4ai/browser_manager.py: Implemented the network interception logic for resource filtering.
deploy/docker/crawler_pool.py: Implemented the tiered pool, retirement, and audit logic.
deploy/docker/api.py & deploy/docker/server.py: Updated with try...finally for accurate reference counting.

How Has This Been Tested?

Stress Testing: Performed high-concurrency load tests (50+ concurrent requests) on a Kubernetes cluster. Observed a significant increase in sustained QPS without OOM issues.
Memory Stability: Verified that the "Retirement" and "Janitor" logic successfully reclaims memory during and after high-load periods.
Backward Compatibility: Confirmed that the system remains stable and identical in behavior when all new toggles are set to their default (False) values.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added/updated unit tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Stress test performance

QPS increased by 40%

Resource with no OOM

This commit consolidates several optimizations for crawl4ai in high-concurrency environments: 1. Browser Pool Optimization: - Implemented a tiered browser pool (Hot, Cold, Retired). - Added a browser retirement mechanism based on usage count (MAX_USAGE_COUNT) and memory pressure (MEMORY_RETIRE_THRESHOLD). - Added reference counting (active_requests) to ensure browser instances are not closed while in use. - Enhanced the pool janitor with adaptive cleanup intervals based on system memory. 2. Resource Loading Optimization: - Integrated optional CSS and Ad blocking to reduce memory footprint and improve QPS. - Decoupled resource filtering from text_mode to allow granular control. 3. Stability and Scalability: - Added mandatory release_crawler calls in API/Server handlers to prevent resource leaks. - Introduced environment variables to toggle these new features (defaulting to False for safe community adoption). - Added optional 5-minute pool audit logs for better observability. Co-authored-by: dylan.min <dylan.min@example.com>

…eanup docs - Refactor BrowserManager to dynamically block resources based on avoid_css and text_mode - Align text_mode behavior with community standards (no forced CSS blocking) - Add Top 20 curated ad and tracker patterns for performance - Restore and translate permanent browser logs in crawler_pool.py - Clean up models.py schema annotations and server.py docstrings - Add unit and functional tests for filtering flags

mzyfree · 2026-01-08T09:03:59Z

@unclecode @ntohidi please review this MR

chrizzly2309 · 2026-01-09T06:51:47Z

@mzyfree +1, better supported in high concurrence envs is needed

AlbertInRC · 2026-01-13T03:26:39Z

Look forward to it as well! The current performance is very POOR. The QPS is <1 for 2 CPU + 4GB RAM, tried to fetch 3 URLs in one request

AlbertInRC · 2026-01-13T03:29:26Z

@ntohidi @aravindkarnam Pls help...

AlbertInRC · 2026-01-15T00:45:57Z

Any update pls?

AlbertInRC · 2026-01-30T05:16:02Z

Anyone is looking at this issue pls?

AlbertInRC · 2026-02-24T01:42:50Z

@unclecode @ntohidi @ara
Could you pls help take a look? We are facing the performance issue as well. The temp solution is to use this PR.

unclecode · 2026-02-24T02:54:21Z

@mzyfree Thx for this PR, u have done a good job here, I will review this soon. Sorry for later reply, been very busy preparing our hosted platform for Crawl4ai. @AlbertInRC thx mentioning me on this.

Resolve conflicts in async_configs and docker server while keeping avoid_ads/avoid_css and upstream init_scripts, and retaining upstream URL scheme validation.

mzyfree · 2026-02-24T06:05:45Z

@unclecode

Thanks for the update! Totally understand — appreciate you taking the time to review.
Please let me know if you’d like me to adjust anything or add more benchmarks/test results.
I'm really looking forward to trying out your cloud-hosted version.

unclecode · 2026-02-25T06:41:14Z

@mzyfree Thanks for this excellent PR - the analysis of pool-level resource leaks and the avoid_ads/avoid_css idea were spot-on.

We've been doing a lot of internal refactoring on the browser manager and pool layers recently, so rather than merging this directly (it would need significant rebasing), we've implemented the core ideas from your PR ourselves, adapted to the current codebase:

avoid_ads / avoid_css BrowserConfig flags - opt-in resource filtering with ad/tracker domain blocking and CSS route blocking
release_crawler() + active_requests tracking - proper pool-level lifecycle management so the janitor doesn't close browsers with in-flight requests
finally blocks in all API/server handlers - fixing the resource leak you identified

We intentionally left out the browser retirement mechanism for now since it overlaps with our existing max_pages_before_recycle at the context level - we want to design a unified approach rather than having two competing recycling mechanisms.

These changes are already pushed to the develop branch and will be available in the next version release. We'll be adding your name to the CONTRIBUTORS file as well. Really solid work here - the stress testing data and the tiered pool analysis were very helpful in validating the approach. Closing this PR in favor of the merged implementation, but please keep the contributions coming - this kind of deep performance analysis is exactly what the project needs! You may close this PR.

…ecycle Add opt-in BrowserConfig flags (avoid_ads, avoid_css) for blocking ad/tracker domains and CSS resources at the browser context level. Refactor crawler pool with release_crawler() and active_requests tracking to prevent janitor from closing browsers with in-flight requests. Add proper finally blocks to all Docker API/server handlers. Update docs for new config options. Inspired by #1689.

unclecode · 2026-03-07T05:29:35Z

Thanks for the contribution! This fix has already been implemented on the develop branch via a different approach and will be in the next release. Closing as superseded - appreciate your effort!

dylan.min and others added 2 commits January 4, 2026 09:25

mzyfree force-pushed the perf/concurrency-memory-optimization branch from 01fb7ee to 47bc688 Compare January 6, 2026 05:13

review: update code style

340013d

Merge upstream/main into perf/concurrency-memory-optimization

6a6910a

Resolve conflicts in async_configs and docker server while keeping avoid_ads/avoid_css and upstream init_scripts, and retaining upstream URL scheme validation.

ntohidi changed the base branch from main to develop February 25, 2026 01:27

unclecode added a commit that referenced this pull request Feb 25, 2026

docs: add mzyfree to contributors for PR #1689

8f2c2e1

unclecode closed this Mar 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(docker): optimize concurrency performance and memory management#1689

feat(docker): optimize concurrency performance and memory management#1689
mzyfree wants to merge 4 commits intounclecode:developfrom
mzyfree:perf/concurrency-memory-optimization

mzyfree commented Jan 4, 2026 •

edited

Loading

Uh oh!

mzyfree commented Jan 8, 2026 •

edited

Loading

Uh oh!

chrizzly2309 commented Jan 9, 2026

Uh oh!

AlbertInRC commented Jan 13, 2026

Uh oh!

AlbertInRC commented Jan 13, 2026

Uh oh!

AlbertInRC commented Jan 15, 2026

Uh oh!

AlbertInRC commented Jan 30, 2026

Uh oh!

AlbertInRC commented Feb 24, 2026

Uh oh!

unclecode commented Feb 24, 2026

Uh oh!

mzyfree commented Feb 24, 2026

Uh oh!

unclecode commented Feb 25, 2026 •

edited

Loading

Uh oh!

unclecode commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

mzyfree commented Jan 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Core Enhancements:

New Configuration Options

Engine Layer (BrowserConfig)

Docker Layer (Environment Variables)

List of files changed and why

How Has This Been Tested?

Checklist:

Stress test performance

QPS increased by 40%

Resource with no OOM

Uh oh!

mzyfree commented Jan 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chrizzly2309 commented Jan 9, 2026

Uh oh!

AlbertInRC commented Jan 13, 2026

Uh oh!

AlbertInRC commented Jan 13, 2026

Uh oh!

AlbertInRC commented Jan 15, 2026

Uh oh!

AlbertInRC commented Jan 30, 2026

Uh oh!

AlbertInRC commented Feb 24, 2026

Uh oh!

unclecode commented Feb 24, 2026

Uh oh!

mzyfree commented Feb 24, 2026

Uh oh!

unclecode commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

unclecode commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mzyfree commented Jan 4, 2026 •

edited

Loading

Engine Layer (`BrowserConfig`)

mzyfree commented Jan 8, 2026 •

edited

Loading

unclecode commented Feb 25, 2026 •

edited

Loading